The Randomness Recycler: A New Technique for Perfect Sampling

Abstract

For many probability distributions of interest, it is quite 
difficult to obtain samples efficiently. Often, Markov chains 
are employed to obtain approximately random samples from 
these distributions. The primary drawback to traditional 
Markov chain methods is that the mixing time of the chain 
is usually unknown, which makes it impossible to determine 
how close the output samples are to having the target dis- 
tribution. Here we present a new protocol, the randomness 
recycler (RR), that overcomes this difficulty. Unlike classi- 
cal Markov chain approaches, an RR-based algorithm cre- 
ates samples drawn exactly from the desired distribution. 
Other perfect sampling methods such as coupling from the 
past use existing Markov chains, but RR does not use the 
traditional Markov chain at all. While by no means univer- 
sally useful, RR does apply to a wide variety of problems. In 
restricted instances of certain problems, it gives the first ex- 
pected linear time algorithms for generating samples. Here 
we apply RR to self-organizing lists, the Ising model, random
independent sets, random colorings, and the random
cluster model. 



1 Introduction 

The Markov chain Monte Carlo (MCMC) approach to 
generating samples has enjoyed enormous success since its 
introduction, but in certain cases it is possible to do better.
The "randomness recycler" technique we introduce here
(and whose name is explained in Section 2) works for a variety
of problems without employing the traditional Markov
chain. Our approach is faster in many cases, generating in
particular the first algorithms that have expected running
time linear in the size of the problem, under certain restrictions.

In classical MCMC approaches, small random changes
are made in the observation until the observation has nearly 
the stationary distribution of the chain. The Metropolis [O] 
and heat bath algorithms utilize the idea of reversibility to 
design chains with a stationary distribution matching the 
desired distribution. Unfortunately, this standard Markov 
chain approach does have problems. 

The samples generated by MCMC will not be drawn 
exactly from the stationary distribution, but only approxi- 
mately. Moreover, they will not be close to the stationary 
distribution until a number of steps larger than the mixing 
time of the chain have been taken. Often the mixing time is 
unknown, and so the quality of the sample is suspect. 

Recently, Propp and Wilson have shown how to avoid 
these problems using techniques such as coupling from the 
past (CFTP) [15]. For some chains, CFTP provides a procedure
that allows perfect samples to be drawn from the
stationary distribution of the chain, without knowledge of 
the mixing time. However, CFTP and related approaches 
have drawbacks of their own. These algorithms are non- 
interruptible, which means that the user must commit to 
running such an algorithm for its entire (random) running 
time even though that time is not known in advance. Failure 
to do so can introduce bias into the sample. Other algo- 
rithms, such as FMMR [51], are interruptible (when time is 
measured in Markov chain steps), but require storage and 
subsequent rereading of random bits used by the algorithm. 
The method we will present is both interruptible and "read- 
once," with no storage of random bits needed. 

In addition, algorithms like CFTP and FMMR require an 
underlying Markov chain, and can never be faster than the 
mixing time of this underlying chain. Often these chains 
make changes to parts of the state where the state has al- 
ready been suitably randomized. This leads to wasted effort 
when running the algorithm that often adds a log factor to 
the running time of the algorithm. 

The randomness recycler (RR) is not like any of these 
perfect sampling algorithms. In fact, the RR approach aban- 
dons the traditional Markov chain entirely. This is what 
allows the algorithm in several cases to reach an expected
running time that is linear, the first for several problems of
interest. The RR technique gives interruptible, read-once 
perfect samples. 

In the next section we illustrate the randomness recycler 
for the problem of finding random independent sets of a 
graph. After this example we present in Section 3 the general
randomness recycler procedure and present a (partial)
proof of correctness. In Section 4 we present other applications,
and in a subsequent section we review the results of applying our
new approach to several different problems.

2 Weighted Independent Sets 

We begin by showing how the randomness recycler tech- 
nique applies to the problem of generating a random inde- 
pendent set of a graph. This will illustrate the key features 
of RR and lay the groundwork for the more general proce- 
dure described in the next section. 

Recall that an independent set of a graph is a subset of
vertices no two of which share an edge. We will represent
an independent set as a coloring of the vertices from {0, 1},
denoted generically by $x$. Set $x(v) = 1$ if $v$ is in the
independent set, and $x(v) = 0$ if $v$ is not in the independent set.
This implies that $\sum_v x(v)$ is the size of the independent set.

We wish to sample from the distribution

$$\pi(x) = \frac{\lambda^{\sum_v x(v)}}{Z_\lambda},$$

where $\lambda$ (called the fugacity) is a parameter of the problem,
and $Z_\lambda$ is the normalizing constant needed to make $\pi$
a probability distribution.

This distribution is known as the hard core gas model
in statistical physics, and also has applications in stochastic
loss networks [pl|. When $\lambda$ is large the sample tends to be
a large independent set, and if $\lambda > 25/\Delta$, where $\Delta$ is the
maximum degree of the graph, it is known that generating
samples from this distribution cannot be done in polynomial
time unless NP = RP.

We will show that for $\lambda < 4/(3\Delta - 4)$ the randomness
recycler approach gives an algorithm with expected running
time linear in the size of the graph, the first such result for
this problem.

The RR approach is to start not with the entire graph,
but rather with a small graph where we can easily find an
independent set from this distribution. For example, if a
graph has only a single vertex, finding an independent set
is easy. Starting from a single vertex, we attempt to add
vertices to the graph, building up until we are back at our
original problem. Sometimes we fail in our attempt to build
up the graph, and indeed will then also need to remove vertices
that we had previously added. The set $V_t$ will comprise
those vertices we have built up by the end of time step $t$. After
step $t$, the vector $x_t$ will hold an independent set which
has the correct distribution over (the subgraph induced by)
the vertices in $V_t$.

Randomness Recycler for Independent Sets

Set $V_0 \leftarrow \emptyset$, $x_0 \leftarrow 0$, $t \leftarrow 0$
Repeat
    Set $x_{t+1} \leftarrow x_t$
    Choose any $v \in V \setminus V_t$
    Set $V_{t+1} \leftarrow V_t \cup \{v\}$
    Draw $U$ uniformly at random from $[0,1]$
    If $U < 1/(1 + \lambda)$
        Let $x_{t+1}(v) \leftarrow 0$
    Else
        Let $x_{t+1}(v) \leftarrow 1$
        If a neighbor $w$ of $v$ has $x_{t+1}(w) = 1$
            Let $w$ be the lowest-numbered such neighbor
            Set $x_{t+1}(w) \leftarrow 0$, $x_{t+1}(v) \leftarrow 0$
            Remove from $V_{t+1}$ the vertices $v$ and $w$,
                all neighbors of $w$, and all neighbors of $v$
                with numbers less than that of $w$
    Set $t \leftarrow t + 1$
Until $V_t = V$


(In advance of running the algorithm, choose and fix a
numbering of the vertices.) The algorithm proceeds inductively
as follows. At the outset of step $t + 1$, we begin with
an independent set $x_t$ of $V_t$ chosen with the correct
probability. Then we choose a vertex $v$ not in $V_t$ to attempt to
add. This vertex may be chosen in any fashion desired (randomly,
or according to some fixed order, but not depending
on the independent set $x_t$). Because the desired probability
of choosing an independent set $x$ is proportional to
$\lambda^{\sum_v x(v)}$, putting $x_{t+1}(v) = 1$ has $\lambda$ times the weight of
putting $x_{t+1}(v) = 0$. Therefore we select $x_{t+1}(v) = 1$
with probability $\lambda/(1 + \lambda)$ and $x_{t+1}(v) = 0$ with probability
$1/(1 + \lambda)$ (these are the heat bath probabilities).

Unfortunately, the vector $x_{t+1}$ resulting from this selection
may fail to correspond to an independent set. At line
11 of the pseudocode, we check whether some neighbor of $v$
was already colored 1 (in the independent set). Note that we
cannot simply remove $v$. Prior to the step, we knew that $x_t$
was an independent set of $V_t$. If we observe that $x_t(w) = 1$
for some lowest-numbered neighbor $w$ of $v$, then $x_t$ is an
independent set on $V_t$ conditioned on this knowledge.

Our solution is this: In line 14 we "undo" the knowledge
gained by removing from $V_{t+1}$ the vertices $v$ and $w$, all the
neighbors of $w$, and all the neighbors of $v$ with number less
than that of $w$. On the remaining vertices of $V_{t+1}$, $x_{t+1}$ will
continue to be an independent set from the correct distribution.
We will say that an RR step of this type preserves the
correct distribution.

Note that although $V_{t+1}$ is made smaller than $V_t$ in the
case of a conflict, we are able to salvage most of the vertices
in $V_t$. In other words, we "recycle" the randomness built up
in all of the vertices except $v$ and $w$ and some neighbors.
This is where our approach gets its name, and "recycling"
is the key new feature that enables us to construct similar
practicable algorithms for a wide variety of problems.

We repeat until $V_t = V$. Because each step preserves the
correct distribution, we know that $x_t$ will have the correct
distribution $\pi$ at the end. This is proved formally in the next
section; here we concentrate on bounding the running time
of our procedure.
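
As a concrete illustration, the procedure above can be sketched in Python. This is a minimal sketch rather than the authors' implementation: the function name `rr_independent_set`, the adjacency dictionary `adj`, and the rule of always trying the lowest-numbered vertex outside $V_t$ are assumptions made for the example (the text allows any selection rule that does not depend on $x_t$).

```python
import random

def rr_independent_set(adj, lam, rng=random):
    """Sketch of the RR procedure for independent sets.

    adj : dict mapping each vertex (an int) to the set of its neighbors
    lam : the fugacity lambda
    Returns (x, steps): the final coloring and the number of loop iterations.
    """
    vertices = sorted(adj)
    V_t = set()                          # vertices built up so far
    x = {v: 0 for v in vertices}         # everything outside V_t is colored 0
    steps = 0
    while len(V_t) < len(vertices):
        steps += 1
        # Assumed selection rule: lowest-numbered vertex not yet in V_t.
        v = next(u for u in vertices if u not in V_t)
        V_t.add(v)
        if rng.random() < 1.0 / (1.0 + lam):
            x[v] = 0
        else:
            x[v] = 1
            occupied = sorted(w for w in adj[v] if x[w] == 1)
            if occupied:
                w = occupied[0]          # lowest-numbered conflicting neighbor
                x[w] = 0
                x[v] = 0
                # Undo the knowledge gained: drop v, w, all neighbors of w,
                # and the neighbors of v numbered below w.
                V_t -= {v, w} | adj[w] | {u for u in adj[v] if u < w}
    return x, steps
```

On a small graph such as the three-vertex path, repeated calls can be tallied and compared against the weights $\lambda^{\sum_v x(v)}$ as a sanity check of the sketch.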

Theorem 1 If $\lambda < 1/(2\Delta - 1)$, then the expected running
time of the above randomness recycling procedure for random
independent sets is $O(n)$.

A more careful statement of Theorem 1 is given following
the proof.

Proof We will show that for $\lambda$ this small, on average $|V_t|$
increases at each step. If $U < 1/(1 + \lambda)$, then the size
of $|V_t|$ goes up by 1, but if $U > 1/(1 + \lambda)$, then the size
of $|V_t|$ may decrease by at most $2\Delta - 1$ [removing $v$ (not
previously included), $w$, and some neighbors]. Hence

$$E\bigl[\,|V_{t+1}| - |V_t| \,\big|\, V_t, x_t\bigr] \ge \frac{1}{1+\lambda}(1) - \frac{\lambda}{1+\lambda}(2\Delta - 1) = \frac{1}{1+\lambda}\bigl[1 - (2\Delta - 1)\lambda\bigr],$$

which is positive precisely when $\lambda < 1/(2\Delta - 1)$. Given
an increase of $|V_t|$ on average at each step, standard martingale
stopping theorems (see, e.g., [Ml) show that after
$O(n)$ expected time the value of $|V_t|$ will be $n$, at which
point $V_t = V$ and the algorithm terminates. □



More carefully, if

$$\frac{1}{1+\lambda}\bigl[1 - (2\Delta - 1)\lambda\bigr] \ge \gamma \in (0, 1),$$

i.e., if

$$\lambda \le \frac{1 - \gamma}{2\Delta - 1 + \gamma},$$

then the expected value of the running time $T$ (measured by
number of iterations of the Repeat loop) satisfies

$$E\,T \le n/\gamma.$$

Furthermore, a simple argument shows that the distribution
of $T$ has at worst geometrically thick tails:

$$P(T > 2mn/\gamma) \le 2^{-m}, \qquad m = 1, 2, \ldots.$$

Several tricks may be used either to improve our
method or to improve our bounds on its performance. The
first two we present work by altering the algorithm, and the
third gives a better analysis. First we concentrate on making
sure that as few vertices as possible are removed in the
rejection step. Note that we may assume that the graph is
connected, since otherwise we simply work on each connected
component separately. Therefore $V \setminus V_0$ is connected, and
by being slightly careful in how we choose $v \in V \setminus V_t$, we
can ensure that $V \setminus V_t$ remains connected at each step. In
every step (except when $|V \setminus V_t| = 1$), the vertex $v$ is not
adjacent to $\Delta$ vertices in $V_t$, but only to at most $\Delta - 1$, so
fewer vertices are removed during rejection. Since the vertices
removed from $V_t$ in case of rejection are connected to
$V \setminus V_t$, $V \setminus V_{t+1}$ will also be connected whether we accept or
reject, and no more than an extra constant amount of work
is required at each step.

The second alteration concerns how we look for the
neighbor of $v$ that is colored 1 in case of rejection. Instead
of starting at the lowest-numbered neighbor and working
our way up, we start at a random neighbor and continue
looking in cyclical order until we find our $w$ colored 1; and
then the vertices that we remove are $w$ and its neighbors,
and $v$ and those of its neighbors encountered in the search prior to
finding $w$. On average (and together with the first trick), we
need only look at $\le \Delta/2$ vertices in order to find $w$.
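
The cyclic search can be sketched as follows. This is an illustrative sketch only: the helper name `find_conflict_cyclic` and its return convention (the conflicting neighbor together with the neighbors examined before it) are assumptions for the example, not notation from the paper.

```python
import random

def find_conflict_cyclic(v_neighbors, x, rng=random):
    """Scan the neighbors of v cyclically from a random starting position.

    v_neighbors : list of the neighbors of v, in some fixed order
    x           : dict mapping vertices to colors in {0, 1}
    Returns (w, scanned): the first neighbor found with color 1 (or None)
    and the list of neighbors examined before it, all of which have color 0.
    """
    k = len(v_neighbors)
    if k == 0:
        return None, []
    start = rng.randrange(k)             # random starting neighbor
    scanned = []
    for i in range(k):
        u = v_neighbors[(start + i) % k]
        if x[u] == 1:
            return u, scanned            # only `scanned` had to be inspected
        scanned.append(u)
    return None, scanned
```

In a rejection step, only `w`, the neighbors of `w`, and the vertices in `scanned` would then need to be removed from $V_t$, since no information has been revealed about the neighbors of $v$ that were never examined.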

Finally, in our analysis we kept track of a potential
$\phi(V_t, x_t) = |V_t|$ and showed that $\phi$ increases on average.
When we accept we sometimes add a vertex colored 1 to
our set $V_t$; but when we reject, precisely one vertex colored 1
(namely, $w$) is removed. This suggests that we modify
$\phi$ so that the acceptance and rejection phases both lead
(in the worst case) to the same expected change in $\phi$. We
will consider

$$\phi(V_t, x_t) := |V_t| - \alpha \sum_v x_t(v)$$

and seek a suitable value of $\alpha$. The expected change in $\phi$
if no neighbor of $v$ is colored 1 is $1 - \alpha\,[\lambda/(1 + \lambda)]$. If
some neighbor is colored 1, then the expected change is at least

$$1 \cdot \frac{1}{1+\lambda} + \frac{\lambda}{1+\lambda}\left[-\frac{3\Delta - 2}{2} + \alpha\right].$$

[The term $(3\Delta - 2)/2$ is an upper bound on the expected decrease
in $|V_t|$, since (see above) on average we lose at most $\Delta/2$
neighbors of $v$ and $\Delta - 1$ neighbors of $w$.]

These two expressions may be made equal by setting
$\alpha = 3\Delta/4$, and then the expected change in $\phi$ will be positive
when

$$\lambda < \frac{4}{3\Delta - 4}.$$

Note that $\phi$ at time 0 equals 0, and can never be more
than $n$, and we have shown that $\phi$ is expected to increase
by a fixed positive amount at each step (when we formulate
carefully as in the paragraph following the proof of
Theorem 1). This fact together with standard martingale
stopping theorems can then be used to show that the expected
time needed for $|V_t|$ to equal $n$ is at most linear in $n$.
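
The algebra behind the choice $\alpha = 3\Delta/4$ can be checked with exact rational arithmetic. The sketch below simply evaluates the two expected-change expressions from the text; the function names are made up for this illustration.

```python
from fractions import Fraction

def drift_without_conflict(lam, alpha):
    """Expected change in phi when no neighbor of v is colored 1."""
    return 1 - alpha * lam / (1 + lam)

def drift_with_conflict(lam, delta, alpha):
    """Lower bound on the expected change in phi when some neighbor is colored 1."""
    accept = 1 / (1 + lam)        # v receives color 0 and |V_t| grows by 1
    reject = lam / (1 + lam)      # v receives color 1 and conflicts with w
    return accept * 1 + reject * (-Fraction(3 * delta - 2, 2) + alpha)

def fugacity_threshold(delta):
    """The bound 4 / (3*Delta - 4) on lambda stated in the text."""
    return Fraction(4, 3 * delta - 4)

# With alpha = 3*Delta/4 the two expressions coincide, and the common value
# changes sign exactly at the threshold.
delta = 4
alpha = Fraction(3 * delta, 4)
lam = Fraction(1, 3)
same = drift_without_conflict(lam, alpha) == drift_with_conflict(lam, delta, alpha)
at_threshold = drift_without_conflict(fugacity_threshold(delta), alpha)
```

Passing `Fraction` values for `lam` keeps every quantity exact, so the equality of the two expressions can be tested with `==` rather than a floating-point tolerance.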



2.1 Markov chain approaches 

Several Markov chains for this problem exist Jl^ [E|,
together with techniques for using CFTP to obtain perfect
samples [^ |0]. These Markov chains are known to
mix in time $O(n \ln n)$, and the corresponding perfect sampling
algorithms are known to run in time $O(n \ln n)$, when
$\lambda < 2/(\Delta - 2)$, which is a larger range of $\lambda$ than our method
gives. However, when $\lambda$ is small enough, the $O(n)$ bound
for our RR algorithm is smaller. It is hoped that with further
refinement of the rejection step, the range of $\lambda$ may be
increased to where it matches the Markov chain analysis.

3 The Randomness Recycler 

We now present a more general outline of the randomness
recycler technique. Many state spaces $\Omega$ of interest are
of the form $\Omega \subseteq C^V$, where $C^V$ is the set of (proper or
improper) colorings of a graph. Our goal is to sample from
$\Omega$ in expected time linear in $|V|$. We have already seen how
the independent sets of a graph may be encoded by coloring
a vertex 1 if it is in the independent set and 0 otherwise.
For another example, the set of permutations of $n$ elements
is a subset of $\{1, \ldots, n\}^{\{1, \ldots, n\}}$. Of course, the size of $\Omega$
may be as large as $|C|^{|V|}$, and this is in part what makes
generating samples from these distributions difficult.



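The Randomness Recycler (Outline)

Set $V_0 \leftarrow \emptyset$, $X_0 \leftarrow$ suitable $x_0$, $t \leftarrow 0$
Repeat
    Set $X_{t+1} \leftarrow X_t$
    Choose $v \in V \setminus V_t$
    Randomly choose color $c$ for $v$
    Compute probability of accepting color $c$ for $v$
    If we accept
        Set $X_{t+1}(v) \leftarrow c$
        Set $V_{t+1} \leftarrow V_t \cup \{v\}$
    Else
        Set $V_{t+1}$ and $X_{t+1}|_{V \setminus V_{t+1}}$ in a way that
            'undoes' the effect of rejection
    Set $t \leftarrow t + 1$
Until $V_t = V$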



In an RR algorithm, a sample (i.e., one draw from $\pi$) is
built up one vertex of $V$ at a time until we include all of the
vertices. Let $V_t$ be the subset of vertices on which we have
already built up a sample at time $t$. On the vertices in $V \setminus V_t$,
the sample is fixed at some value, whereas on $V_t$, the sample
is random, and drawn exactly from the desired distribution.
$V_t$ starts out empty, and at each step of the algorithm we
attempt to add a vertex to $V_t$. Sometimes this is possible,
and sometimes it is not. We continue in this fashion until
$V_t = V$, at which point we have a sample drawn exactly
from the desired distribution. Let $X_t$ denote the coloring of
the graph $V$ at time $t$.

The way in which we randomly choose $c$, compute the
acceptance probability, and set $V_{t+1}$ and $X_{t+1}|_{V \setminus V_{t+1}}$ in
case of rejection will all depend on the target distribution $\pi$.
What differentiates this algorithm from an elementary stepwise
rejection approach is our rejection step. Rather than
starting over when rejection is faced, we keep as much of
$V_t$ as possible, "recycling" the coloring on $V_{t+1}$.
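
The outline can be read as a generic loop with three problem-specific hooks. The sketch below is one possible rendering under stated assumptions: the names `randomness_recycler` and `independent_set_hooks`, the callback signatures, and the all-zero initial coloring are hypothetical choices for this example. The hooks shown instantiate the loop for independent sets, using the heat bath proposal and the acceptance probabilities 1 and $1/(1+\lambda)$ discussed later in this section.

```python
import random

def randomness_recycler(vertices, choose_vertex, propose, accept_prob, undo,
                        rng=random):
    """Generic RR loop: grow V_t one vertex at a time, recycling on rejection."""
    V_t = set()
    x = {v: 0 for v in vertices}           # assumed initial coloring x_0
    while len(V_t) < len(vertices):
        v = choose_vertex(vertices, V_t)   # must not depend on the coloring x
        c = propose(v, x, rng)             # randomly choose a color for v
        if rng.random() < accept_prob(v, c, x):
            x[v] = c
            V_t.add(v)
        else:
            undo(v, x, V_t)                # shrink V_t, fix colors outside it
    return x

def independent_set_hooks(adj, lam):
    """Hooks for independent sets (adj: vertex -> set of neighbors)."""
    def occupied_neighbors(v, x):
        return sorted(w for w in adj[v] if x[w] == 1)

    def choose_vertex(vertices, V_t):
        return min(v for v in vertices if v not in V_t)

    def propose(v, x, rng):                # heat bath proposal
        if occupied_neighbors(v, x):
            return 0
        return 1 if rng.random() < lam / (1 + lam) else 0

    def accept_prob(v, c, x):
        return 1 / (1 + lam) if occupied_neighbors(v, x) else 1.0

    def undo(v, x, V_t):
        w = occupied_neighbors(v, x)[0]    # lowest-numbered conflicting neighbor
        x[w] = 0
        V_t.difference_update({w} | adj[w] | {u for u in adj[v] if u < w})

    return choose_vertex, propose, accept_prob, undo
```

A call such as `randomness_recycler(sorted(adj), *independent_set_hooks(adj, lam))` then returns one coloring; other target distributions would supply their own three hooks.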

At each time step $t$ we keep track of the vertex set $V_t$
together with the colors that are fixed on $V \setminus V_t$. The state
$X_t$ is random over $V_t$ while on $V \setminus V_t$ it is deterministic.
Let $X^*_t = (V_t, X_t|_{V \setminus V_t})$, and for any possible value
$x^* = (S, x|_{V \setminus S})$ of $X^*_t$, let $\pi_{x^*}$ be $\pi$ conditionally given that the
colors of $V \setminus S$ are as specified by $x|_{V \setminus S}$.

To achieve both the desired distribution and interruptibility,
we want $X_t$ to be random over $V_t$ independent of the
history $X^*_{t'}$ for $t' \le t$. In other words we want the identity

$$P(X_t = x_t \mid X^*_0 = x^*_0, \ldots, X^*_t = x^*_t) = \pi_{x^*_t}(x_t) \tag{1}$$

to hold. Indeed, if it does, then letting $T$ denote the first
time that $V_t = V$, it follows easily that

$$P(X_t = x \mid T = t) = \pi(x).$$

Thus if (1) is satisfied for all $t$, then at termination time $T$
the RR algorithm returns a sample $X_T$ that is distributed
according to the desired distribution, and we have the
interruptibility property that $T$ and $X_T$ are independent random
variables.

Since $V_0$ is empty, it is easy to begin with $X_0$ from $\pi_{x^*_0}$.
Let $H_t := \{X^*_0 = x^*_0, X^*_1 = x^*_1, \ldots, X^*_t = x^*_t\}$ for
notational convenience. We will say that step $t+1$ preserves the
correct distribution if

$$P(X_t = x_t \mid H_t) = \pi_{x^*_t}(x_t)$$

implies

$$P(X_{t+1} = x_{t+1} \mid H_{t+1}) = \pi_{x^*_{t+1}}(x_{t+1}).$$

This requirement that RR preserve the correct distribu- 
tion is somewhat analogous to the design requirement that 
a Markov chain be reversible. It gives us a straightforward 
approach to designing an RR. 

Just as the heat bath approach gives a means for designing
Markov chains that are reversible, it also gives us a
method for designing RR algorithms that preserve the correct
distribution. For a specified vertex $v \in V$ and coloring
$x$, let $\pi_v(\cdot\,; x)$ denote the conditional probability distribution
of $X(v)$ given that $X|_{V \setminus \{v\}} = x|_{V \setminus \{v\}}$ when $X$
has the stationary distribution $\pi$. From current state $x$, the
heat bath (or Gibbs sampler) Markov chain approach is to
choose $v$ uniformly at random and then choose a new color
for $v$ distributed according to $\pi_v(\cdot\,; x)$.



In heat bath RR, the vertex $v$ is chosen any way the user
desires from $V \setminus V_t$, and then a new color is picked according
to $\pi_v(\cdot\,; x)$. However, this color is not always accepted.
We compute the acceptance probability as follows, with the
goal being to preserve the correct distribution. According
to Theorem 2 below, this goal is indeed met.

Given values $x^*_t$, $x_t$, $x^*_{t+1}$, and $x_{t+1}$ that correspond to
a possible acceptance step in which vertex $v$ is added to the
growing vertex set, define $\rho(x^*_t, x_t, x^*_{t+1}, x_{t+1})$ to be the
ratio

$$\rho(x^*_t, x_t, x^*_{t+1}, x_{t+1}) := \frac{\pi_{x^*_{t+1}}(x_{t+1})}{\pi_v(x_{t+1}(v); x_t)\,\pi_{x^*_t}(x_t)}.$$

Also define

$$M(x^*_t, x^*_{t+1}) := \max_{x_t,\, x_{t+1}} \rho(x^*_t, x_t, x^*_{t+1}, x_{t+1}).$$

Then the probability that we accept a possible transition
from $(x^*_t, x_t)$ to $(x^*_{t+1}, x_{t+1})$ is taken to be
$\rho(x^*_t, x_t, x^*_{t+1}, x_{t+1}) / M(x^*_t, x^*_{t+1})$.

We do not have to use the heat bath probabilities. It
is also valid to use the same acceptance probability, with
the distributions $\pi_v(\cdot\,; x)$ replaced by arbitrary distributions
$p_v(\cdot\,; x)$, when the distribution $p_v(\cdot\,; x)$ is used to color a
selected $v$ when at a configuration $x$.

While these acceptance probabilities may appear daunting,
for many problems they simplify considerably. For instance,
in the independent set case, suppose first that $v$ has
no neighbor colored 1. Then the heat bath probabilities are
$1/(1 + \lambda)$ for color 0 and $\lambda/(1 + \lambda)$ for color 1. The acceptance
probability in this first case will always be 1.

If instead some neighbor of $v$ is colored 1, then heat bath
assigns probability 1 to the color 0. The acceptance probability,
however, works out to $1/(1 + \lambda)$. Careful examination
of the independent set algorithm in Section 2 shows
that this is exactly how the color for $v$ is chosen, with the
same acceptance probabilities.
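
These two acceptance probabilities can be reproduced by brute force on a small graph, directly from the ratio of conditional distributions divided by its maximum. The sketch below is illustrative only: the function names are hypothetical, all vertices outside the current set `S` are assumed to be fixed at color 0 (as in the Section 2 algorithm), and enumeration over subsets is feasible only for tiny graphs.

```python
from fractions import Fraction
from itertools import combinations

def pi_conditional(adj, S, lam):
    """pi restricted to S, with every vertex outside S fixed at color 0.

    Returns a dict: frozenset of vertices colored 1 -> probability.
    """
    S = sorted(S)
    weights = {}
    for k in range(len(S) + 1):
        for occ in combinations(S, k):
            if all(w not in occ for u in occ for w in adj[u]):
                weights[frozenset(occ)] = lam ** k
    Z = sum(weights.values())
    return {occ: wt / Z for occ, wt in weights.items()}

def heat_bath(adj, v, occ, lam, c):
    """Heat bath probability of color c at v given the occupied set occ."""
    if any(w in occ for w in adj[v]):
        return Fraction(int(c == 0))
    return lam / (1 + lam) if c == 1 else 1 / (1 + lam)

def acceptance_probabilities(adj, S, v, lam):
    """Ratio / max-ratio for every (current occupied set, proposed color)."""
    pi_t = pi_conditional(adj, S, lam)
    pi_next = pi_conditional(adj, set(S) | {v}, lam)
    ratio = {}
    for occ, prob in pi_t.items():
        for c in (0, 1):
            hb = heat_bath(adj, v, occ, lam, c)
            if hb > 0:
                new_occ = occ | {v} if c == 1 else occ
                ratio[(occ, c)] = pi_next[new_occ] / (hb * prob)
    M = max(ratio.values())
    return {key: r / M for key, r in ratio.items()}

# Path 0 - 1 - 2 with V_t = {0, 1}; we try to add v = 2 (lambda = 1/2).
adj = {0: {1}, 1: {0, 2}, 2: {1}}
lam = Fraction(1, 2)
acc = acceptance_probabilities(adj, {0, 1}, 2, lam)
```

Here `acc[(frozenset({1}), 0)]` is the case in which the neighbor 1 of $v = 2$ is colored 1, and the remaining entries are the cases with no neighbor colored 1.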

To show that the heat bath randomness recycler approach 
actually works (in general), we need to show that every step 
preserves the correct distribution. We will first consider ac- 
ceptance steps, for which the following lemma gives a suf- 
ficient condition. 

Lemma 1 Given possible values $x^*_t$, $x^*_{t+1}$, and $x_{t+1}$ of $X^*_t$,
$X^*_{t+1}$, and $X_{t+1}$ corresponding to an acceptance step, suppose
that only one value $x_t$ of $X_t$ has positive probability.
If the bivariate process $(X^*_t, X_t)_{t \ge 0}$ evolves Markovianly
and if for all such $x^*_t$, $x^*_{t+1}$, and $x_{t+1}$ and the single $x_t$ they
determine we have

$$P(X^*_{t+1} = x^*_{t+1},\, X_{t+1} = x_{t+1} \mid X^*_t = x^*_t,\, X_t = x_t)\,\pi_{x^*_t}(x_t) = \pi_{x^*_{t+1}}(x_{t+1})\,C,$$

where $C$ does not depend on $x_t$ or $x_{t+1}$, then step $t + 1$
preserves the correct distribution.



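Proof Let $C_1 := 1/P(X^*_{t+1} = x^*_{t+1} \mid H_t)$, and suppose
that $P(X_t = x_t \mid H_t) = \pi_{x^*_t}(x_t)$. Let $E$ be the event that
$X^*_{t+1} = x^*_{t+1}$ and $X_{t+1} = x_{t+1}$. Then

$$\begin{aligned}
P(X_{t+1} = x_{t+1} \mid H_{t+1})
&= C_1\, P(X^*_{t+1} = x^*_{t+1},\, X_{t+1} = x_{t+1} \mid H_t) \\
&= C_1\, P(E \cap \{X_t = x_t\} \mid H_t) \\
&= C_1\, P(X_t = x_t \mid H_t)\, P(E \mid H_t \cap \{X_t = x_t\}) \\
&= C_1\, \pi_{x^*_t}(x_t)\, P(E \mid X^*_t = x^*_t,\, X_t = x_t) \\
&= C_1 C\, \pi_{x^*_{t+1}}(x_{t+1}),
\end{aligned}$$

where the last step is exactly our assumption.

Note that neither $C_1$ nor $C$ depends on $x_{t+1}$. Hence,
summing over $x_{t+1}$,

$$1 = C_1 C \sum_{x_{t+1}} \pi_{x^*_{t+1}}(x_{t+1}) = C_1 C.$$

This completes the proof. □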

Theorem 2 The heat bath RR and arbitrary RR acceptance 
steps preserve the correct distribution. 

Proof The acceptance probabilities were chosen precisely
to match the requirements of Lemma 1. For instance, with
heat bath RR, the left side of the equation in Lemma 1
equals

$$\pi_v(x_{t+1}(v); x_t) \times \frac{\pi_{x^*_{t+1}}(x_{t+1})}{\pi_v(x_{t+1}(v); x_t)\,\pi_{x^*_t}(x_t)\,M(x^*_t, x^*_{t+1})} \times \pi_{x^*_t}(x_t),$$

which reduces to the right side of the equation with
$C = 1/M(x^*_t, x^*_{t+1})$. The calculation for arbitrary RR is entirely
similar. □

Now we turn our attention to rejection steps. In design- 
ing an RR algorithm, it is our experience that proper han- 
dling of rejection steps to ensure preservation of the cor- 
rect distribution is more difficult and problem-specific to 
arrange than is proper handling of acceptance steps. But 
here are some broad guiding comments. 

Determination of the acceptance probability at step $t+1$
will reveal knowledge about the colors of some subset, call
it $D_t$, of $V_t$. If we reject, we then set $V_{t+1}$ to be $V_t \setminus D_t$.
This ensures that when we reject, we do not bias the sample.
That is, by removing $D_t$ from $V_t$, we remove all traces of
our knowledge gained, and as a result the remaining sample
is drawn exactly from $\pi_{x^*_{t+1}}$.

In the case of the independent sets, the set $D_t$ consists
of precisely those vertices prescribed to be removed by the
algorithm: $w$ and all its neighbors, and neighbors of $v$ with
numbers lower than that of $w$. [Indeed, all of these vertices
are colored 0 at time $t$, except for vertex $w$, which is
colored 1.] It is not hard to check rigorously in this case that
rejection steps also preserve the correct distribution, but we
omit the details.

4 Applications 

This section applies the randomness recycler approach 
to several different problems of interest. For some of these 
models we have theoretical bounds on the running time, 
while for others we have only experimental results. 

The Ising and Potts models In the Ising model, vertices
in a graph $(V, E)$ are colored from the set $\{-1, 1\}$. The
distribution $\pi$ from which we wish to sample is defined by

$$\pi(x) \propto \exp\{-\beta J H(x)\},$$

where

$$H(x) := -\sum_{\{v_1, v_2\} \in E} x(v_1)\,x(v_2)$$

is known as the energy of the coloring, $\beta$ is (proportional to)
a positive parameter known as inverse temperature, and $J$
is 1 in the ferromagnetic model and $-1$ in the antiferromagnetic
model. Generating approximate samples may be
done in (nonlinear) polynomial time in the ferromagnetic
case using Markov chain techniques of Jerrum and
Sinclair [^ [|i6|.

The RR approach has provably linear expected running
time for both the ferromagnetic and antiferromagnetic models
when $\beta$ is small (i.e., the temperature is high). The set
$D_t$ to be removed from $V_t$ in case of rejection is just the set
of neighbors of the vertex $v$ that we tried to add. Omitting
details and proofs, we simply state the running time bound
in the following theorem.

Theorem 3 Let $\Delta$ be the maximum degree of the graph. If

$$e^{2\beta\Delta} < 1 + 1/\Delta,$$

then the expected running time of the heat bath RR procedure
for the Ising model is $O(n)$.

Comments like those following the proof of Theorem 1
apply here, where now the expected increase
$\frac{1}{1+\lambda}[1 - (2\Delta - 1)\lambda]$ in $|V_t|$ becomes $(\Delta + 1)e^{-2\beta\Delta} - \Delta$.

Although not needed for the theorem, in practice it helps
to introduce a third color 0 to supplement $\{-1, 1\}$. Notice
that no edge with an endpoint colored 0 contributes to $H$.
At the completion of step $t$, every vertex in $V \setminus V_t$ which is
surrounded entirely by vertices in $V \setminus V_t$ may be recolored 0
since this action does not affect the vertices in $V_t$ at all.

The Potts model differs from the Ising model in that
more than two colors are used, but the energy depends (in
a natural way) only on whether edges are colored concordantly
or discordantly, and the running time Theorem 3
remains valid verbatim.



The Random Cluster Model The random cluster model
is an extension of the Potts model to noninteger numbers of
colors [^; this is discussed further below. Unlike our previous
examples, which colored vertices, the random cluster
model colors edges of a given graph $G = (V, E)$ with colors
from {0, 1}. If $A$ is the set of edges colored 1, then the
distribution is

$$\pi(A) := p^{|A|}(1 - p)^{|E \setminus A|}\, q^{c(A)} / Z_{p,q}, \qquad A \subseteq E,$$

where $p \in [0, 1]$; $q > 0$ is not necessarily an integer, and
we shall assume $q > 1$; $c(A)$ is the number of connected
components in the graph $(V, A)$; and $Z_{p,q}$ is a normalizing
constant.

The RR approach is as follows. We represent a set $A \subseteq E$
by a binary vector $x$, by setting $x(e) = 1$ for $e \in A$,
and $x(e) = 0$ otherwise. At each step, we keep track of
such a vector $x_t$ and a set $E_t$ of edges, namely, the edges
on which $x_t$ is random; all other edges will be colored 0.
We choose an oriented edge $e = (v, w) \in E \setminus E_t$, until
such an edge $e$ no longer exists. We set $x_{t+1}(e) = 1$ with
probability $p$, and $x_{t+1}(e) = 0$ with probability $1 - p$. If $v$
and $w$ are already connected in $x_t$ [i.e., in the graph $(V, A_t)$
where $A_t = \{e' : x_t(e') = 1\} \subseteq E_t$], then we accept
the edge and set $E_{t+1} = E_t \cup \{e\}$. If $v$ and $w$ are not
already connected, then we always accept $x_{t+1}(e) = 0$, but
we accept $x_{t+1}(e) = 1$ only with probability $1/q$ (since by
adding this edge we reduce by 1 the number of connected
components).

When we reject, we know that $v$ and $w$ lie in separate
components in $(V, A_t)$. To counteract this knowledge, to
form $E_{t+1}$ we remove from $E_t$ all the edges in the component
of $(V, A_t)$ that contains $w$, together with all edges
of $E_t$ that lead out of this component (and which therefore
do not belong to $A_t$).

We could cease our handling of a rejection step at this
point and prove that (a) the algorithm works correctly and
(b) Theorem 4 below holds (and the proof simplifies somewhat)
with the bound on $p$ decreased to

$$p < 1/(\Delta - (1/q)).$$

However, we shall omit the formal proof of correctness and
instead discuss a small (provably valid) trick which gains us
some efficiency.

Suppose that there are $M$ vertices in the removed component.
Consider the (connected!) graph consisting of the
vertices and edges in this component, together with the vertex
$v$ and the edge $\{v, w\}$. Choose (in any fashion) a spanning
tree $T$ of this graph; $T$ will comprise $M + 1$ vertices
and therefore $M$ edges. Add back all these $M$ edges to get
$E_{t+1}$. Sample from the random cluster model on $T$, and
add back in the edges thereby colored 1 to get $A_{t+1}$.

The key observation here is that it is elementary to sample
from the random cluster model when the graph is a
tree. Indeed, then each edge independently is colored 1
with probability $\tilde{p}/(1 - p + \tilde{p})$ and 0 with probability
$(1 - p)/(1 - p + \tilde{p})$, where

$$\tilde{p} := p/q.$$
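
The tree observation can be verified by enumeration on a small example. The sketch below compares the law $p^{|A|}(1-p)^{|E \setminus A|} q^{c(A)}$, normalized by brute force, with the product of independent edge probabilities; the function names and the union-find component counter are hypothetical helpers written for this illustration.

```python
from fractions import Fraction
from itertools import product

def tree_edge_probability(p, q):
    """Probability that a tree edge is colored 1: (p/q) / (1 - p + p/q)."""
    p_tilde = p / q
    return p_tilde / (1 - p + p_tilde)

def count_components(n_vertices, edges):
    """Number of connected components of ({0..n-1}, edges), via union-find."""
    parent = list(range(n_vertices))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = n_vertices
    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[ru] = rw
            components -= 1
    return components

def random_cluster_law(n_vertices, edges, p, q):
    """Exact random cluster distribution, keyed by the 0/1 tuple of edge colors."""
    weights = {}
    for bits in product((0, 1), repeat=len(edges)):
        A = [e for e, b in zip(edges, bits) if b]
        weights[bits] = (p ** len(A) * (1 - p) ** (len(edges) - len(A))
                         * q ** count_components(n_vertices, A))
    Z = sum(weights.values())
    return {bits: wt / Z for bits, wt in weights.items()}

# A star with three edges (a tree on 4 vertices), with a noninteger q.
tree_edges = [(0, 1), (1, 2), (1, 3)]
p, q = Fraction(1, 3), Fraction(5, 2)
law = random_cluster_law(4, tree_edges, p, q)
r = tree_edge_probability(p, q)
```

Each entry of `law` can then be compared with the corresponding product of `r` and `1 - r` factors; with `Fraction` inputs the comparison is exact.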

The random cluster model is an extension of the ferromagnetic
Ising and Potts models. When $q > 1$ is an integer,
and $p = 1 - \exp\{-\beta\}$, then samples from the random cluster
model may be used to generate samples from the ferromagnetic
Potts model with $q$ colors by independently taking
each connected component of $(V, A)$, uniformly choosing
one of the $q$ colors, and assigning to every vertex in the
component that color. For certain instances of the random cluster
model, the heat bath Markov chain approach is believed
from experimental evidence to be rapidly mixing [p^, but
no theoretical rapid mixing results in the positive direction
are known for any nontrivial instances of the problem. For
some instances, the Markov chain approach is known not to
be rapidly mixing [q|]. For the RR approach, we know that
when $p$ is small (corresponding to small $\beta$), the approach
takes an expected number of steps which is linear in the
number of edges:

Theorem 4 Suppose that

$$p < \frac{\Delta - (1/q) - \sqrt{[\Delta - (1/q)]^2 - 4[1 - (1/q)](\Delta - 1)}}{2[1 - (1/q)](\Delta - 1)}.$$

Then the expected number of steps required by the RR algorithm
is $O(|E|)$.

For example, if $\Delta = 4$ (as on a 2-dimensional rectangular
grid) and $q = 2$ (corresponding to the Ising model), then
our restriction is that $p < 1/3$; this improves on the restriction
$p < 1/(\Delta - (1/q)) = 2/7$ obtained when the "add a
tree" trick is not employed.
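
The two bounds in this example are easy to evaluate numerically. The following sketch just transcribes the expression in Theorem 4 and the simpler bound $1/(\Delta - (1/q))$; the function names are hypothetical.

```python
import math

def rr_bound_with_tree(delta, q):
    """Right-hand side of the condition on p in Theorem 4."""
    a = delta - 1 / q
    b = (1 - 1 / q) * (delta - 1)
    return (a - math.sqrt(a * a - 4 * b)) / (2 * b)

def rr_bound_without_tree(delta, q):
    """Bound on p when the "add a tree" trick is not employed."""
    return 1 / (delta - 1 / q)

# Delta = 4 (2-dimensional grid), q = 2 (Ising model).
with_tree = rr_bound_with_tree(4, 2)         # approximately 1/3
without_tree = rr_bound_without_tree(4, 2)   # approximately 2/7
```

For the parameter values tried here, the Theorem 4 expression evaluates to the same number as $1/(\Delta - 1)$, which may be a convenient way to remember the size of the bound.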

Comments analogous to those following the proof of
Theorem 1 again apply.

Proof We use a potential function that rewards us for
adding edges and penalizes us for connecting components.
Let

$$\phi(E_t, A_t) := |E_t| + \alpha\, c(A_t),$$

where $\alpha$ will be determined later.

When the edge $\{v, w\}$ we attempt to add to $E_t$ is between
two vertices already connected in $A_t$, then $\phi$ always
goes up by 1, making this case uninteresting. It is when
$\{v, w\}$ would connect two previously unconnected components
of $A_t$ that the calculation becomes interesting.

If the edge is chosen to be excluded from $A_{t+1}$, then $\phi$
increases by 1. If the edge is proposed to be included in
$A_{t+1}$, then $\phi$ changes by $1 - \alpha$ if we accept. If we reject,
we remove from $A_t$ (and also from $E_t$) a component of size
$M$ and (from $E_t$) all of its adjacent edges. Not counting
the edge $\{v, w\}$ and making sure that we do not double-count,
this totals at most $M(\Delta - 1)$ edges removed from $E_t$.
However, we add exactly $M - 1$ new components to $A_t$
by removing these edges. When we add the tree $T$ back
in, this produces $M$ new edges for $E_{t+1}$, but for each such
edge there is a $\tilde{p}/(1 - p + \tilde{p})$ chance (where $\tilde{p} = p/q$) of
including the edge in $A_{t+1}$ and thereby reducing the number of
components by 1. Therefore, when we attempt to add $\{v, w\}$ to $A_{t+1}$,
but reject instead, the expected contribution to the change
in $\phi$ is at least

$$-M(\Delta - 1) + M + \alpha\left[M - 1 - M\,\frac{\tilde{p}}{1 - p + \tilde{p}}\right].$$

Now $M$ may be very large (nearly as large as $n$), so we
choose $\alpha$ in such a way that the coefficient of $M$ in this
expression vanishes. That is, we set

$$\alpha := (\Delta - 2)\,\frac{1 - p + \tilde{p}}{1 - p},$$

and so the contribution in this case is bounded below by
$-\alpha$.

We try to put the edge in with probability $p$ and to leave
it out with probability $1 - p$. We accept an inclusion with
probability $1/q$. Putting everything together, we find that
the expected change in $\phi$ at any time step when $v$ and $w$ are
not already connected in $A_t$ is at least

$$(1 - p) + p\left[\frac{1}{q}(1 - \alpha) + \left(1 - \frac{1}{q}\right)(-\alpha)\right],$$

which is positive exactly when

$$p < \frac{\Delta - (1/q) - \sqrt{[\Delta - (1/q)]^2 - 4[1 - (1/q)](\Delta - 1)}}{2[1 - (1/q)](\Delta - 1)}. \qquad \Box$$




In this case, the Markov chain approach does not have 
theoretical guarantees on the running time for any nontrivial 
value of p. While coupling from the past may also be used 
to generate perfect samples, there is no a priori bound on its 
running time. 

As with CFTP, we may still use the RR approach for val- 
ues of p for which no theoretical bound exists. We simply 
do not know beforehand how long the algorithm will take. 
Unlike CFTP, the RR approach is interruptible, so we may 
abort the procedure if it needs too many steps, without in- 
troducing bias into the sample. 



Proper colorings of a graph. Finding the number of
proper colorings of a graph is a #P-complete problem [?].
Recall that a proper coloring of a graph assigns each vertex
a color such that no edge has both endpoints colored the
same color. The ability to sample from the set of proper
colorings leads to an approximation algorithm for counting
the number of such colorings.
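
To make the definition concrete, here is a minimal sketch that checks whether a coloring is proper and counts proper colorings by brute force. The graph representation (vertices 0..n-1 and a list of edge pairs) and the function names are assumptions made for illustration. The enumeration visits all k^n assignments, so it is only meant for tiny graphs and is not a substitute for sampling-based approximate counting.

```python
from itertools import product


def is_proper(edges, coloring):
    """Return True if no edge has both endpoints colored the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def count_proper_colorings(n, edges, k):
    """Count proper k-colorings of a graph on vertices 0..n-1 by enumeration.

    Exponential in n; intended only as a check on very small graphs.
    """
    return sum(is_proper(edges, c) for c in product(range(k), repeat=n))
```

A triangle with three colors has 3! = 6 proper colorings, and a path on three vertices with two colors has 2.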

Markov chain approaches require that $k$, the number of
colors, be at least $(11/6)\Delta$ (where $\Delta$ is again the
maximum degree of the graph) [19] in order to guarantee rapid
mixing for the chain. Perfect sampling using bounding
chains [?, ?] is only guaranteed to run in polynomial time
when the number of colors is $\Omega(\Delta^2)$. Unfortunately, the
straightforward RR approach does not match these bounds.
Somewhat roughly stated:

Theorem 5. The heat bath RR approach to generating perfect
colorings requires only a linear expected number of
steps when $k$ is $\Omega(\Delta^{\ldots})$.

As with the bounding chain procedure, however, this al- 
gorithm may be run even when k is much smaller; we sim- 
ply have no reasonable a priori bound on the running time 
in such cases. 

The Move Ahead 1 chain. Finally, we present a problem
where an RR-based algorithm seems experimentally to run 
fast although we cannot give any theoretical bounds. In the 
list update problem, a set of items is kept in a list. To access 
an item, a user starts at the beginning of the list and steps 
through the items until the desired item is located. The lo- 
cated item may be replaced in the list anywhere between its 
current position and the front of the list, at fixed cost. The 
goal is to use a replacement strategy that keeps small the 
access times (i.e., item depths in the list) needed for items. 

Call the strategy which moves the accessed item to the 
front of the list the Move to Front (MTF) rule. A worst-case 
analysis shows that the MTF rule yields a 2-approximation 
for the optimal total access time for any sequence of item 
requests [?]. Alternatively, it is useful to employ
probabilistic models to describe how list items are chosen to be
accessed. Commonly, such an access model will induce a
Markov chain model on the evolution of the order of the list.
Characteristics such as the limiting distribution as $t \to \infty$
of $A_t$, where $A_t$ is the access time for the item accessed at
time t, can then be estimated by drawing from the stationary 
distribution of the chain. 
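
The MTF rule can be sketched as follows. The list is a Python list with the front at index 0, and the access time is taken to be the 1-based depth of the item; both conventions, and the function name, are assumptions made for illustration.

```python
def access_mtf(order, item):
    """Access `item` under the Move to Front rule.

    Returns (access_time, new_order), where access_time is the
    1-based depth of the item before it is moved.
    """
    depth = order.index(item)  # 0-based position of the item
    new_order = [item] + order[:depth] + order[depth + 1:]
    return depth + 1, new_order
```

Accessing item 3 in the list [1, 2, 3, 4] costs 3 and leaves the list as [3, 1, 2, 4].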

To be specific, label the items with identification num- 
bers 1, . . . , n; suppose that at each time step, independently 
of previous time steps, any particular item $i$ is accessed with
probability $p_i > 0$ (independently of the order of the list);
and suppose that after each selection, the accessed item is 
moved forward one rank in the list, i.e., is transposed with 
its predecessor in the list. (If the accessed item is already at
the front of the list, the order of the list is left unchanged.)
The self-organization rule we have described is called the
Move Ahead 1 (MA1) rule. The limiting expected access
time for MA1 is known to be, for any access probability
vector $p$, no more than that for MTF [?]. Further Monte
Carlo study of the limiting access time distribution is
complicated by the fact that sampling from the limiting
list-order distribution $\pi$ (for which a formula is known, but only
up to a normalizing constant) seems to be quite difficult in
general.
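
One step of the MA1 rule, together with an unnormalized weight for the limiting list-order distribution, can be sketched as below. The product form used in `ma1_weight` is not stated in the text; it is the standard form for this transposition rule and can be checked by detailed balance, since swapping adjacent items a and b changes the weight by the factor p_a / p_b. The list conventions (front at index 0, 1-based access time) and the function names are assumptions for illustration.

```python
def access_ma1(order, item):
    """Access `item` under the Move Ahead 1 rule.

    The item is transposed with its predecessor; if it is already
    at the front, the order is unchanged.
    Returns (access_time, new_order) with a 1-based access time.
    """
    i = order.index(item)
    new_order = list(order)
    if i > 0:
        new_order[i - 1], new_order[i] = new_order[i], new_order[i - 1]
    return i + 1, new_order


def ma1_weight(order, p):
    """Unnormalized stationary weight of a list order.

    Assumed product form: the item at (1-based) position j
    contributes p[item] ** (n - j). `p` maps items to access
    probabilities.
    """
    n = len(order)
    weight = 1.0
    for j, item in enumerate(order, start=1):
        weight *= p[item] ** (n - j)
    return weight
```

Normalizing these weights requires summing over all n! orders, which is why the formula alone does not give an efficient sampler.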

Coupling from the past approaches to sampling from $\pi$
exist [?], but experimental evidence suggests that use of RR
gives a faster algorithm. Suppose that $p_i \propto r^i$ for some
ratio $0 < r \le 1$. Then experimental evidence suggests
that for each fixed value of $r \in (0, 1]$ the expected running
time is linear in $n$, although the constant of linearity does
vary with $r$. The Markov chain approach to this problem is
only known to be rapidly mixing when $r < 0.2$ [?].
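
For readers who want to experiment with this access model, the sketch below builds geometric access probabilities and estimates the mean access time by a plain forward run of the MA1 chain. This is ordinary Markov chain simulation, so its output is only approximate; it is not the RR algorithm and not CFTP. The function names, the starting order, and the seed parameter are assumptions for illustration.

```python
import random


def geometric_access_probs(n, r):
    """Access probabilities with p_i proportional to r**i for i = 1..n."""
    weights = [r ** i for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_ma1_mean_access(n, r, steps, seed=0):
    """Average access time over `steps` forward steps of the MA1 chain.

    Starts from the order 1, 2, ..., n. The result is an approximate
    estimate, not derived from a perfect sample.
    """
    rng = random.Random(seed)
    items = list(range(1, n + 1))
    probs = geometric_access_probs(n, r)
    order = list(items)
    total = 0
    for _ in range(steps):
        item = rng.choices(items, weights=probs)[0]
        i = order.index(item)
        total += i + 1  # 1-based access time
        if i > 0:  # move the accessed item ahead by one position
            order[i - 1], order[i] = order[i], order[i - 1]
    return total / steps
```

How long such a run must be before its average is trustworthy is exactly the mixing-time question that perfect sampling avoids.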

5 Conclusion 

The RR approach to perfect sampling gives exact sam- 
ples from difficult distributions without using the traditional 
Markov chain. It is quite different from other recent ap- 
proaches to perfect sampling such as coupling from the past. 

Because it dispenses with the Markov chain, the RR
approach yields, for restricted versions of some of these
problems, the first expected linear time algorithms.
Even when the running time of RR is unknown,
the algorithm may be run and the output will be guaranteed 
to come from the correct distribution. 

Unlike coupling from the past, RR is interruptible, so the 
user may set a time limit on the algorithm's running time (if 
measured in number of iterations of the basic Repeat loop) 
without introducing bias into the sample. Like read-once
coupling from the past [?], this algorithm does not require
storage of any random bits. (Another perfect sampling
approach, that of Fill, Machida, Murdoch, and Rosenthal [?],
is also interruptible but not read-once, and so does require
storage of random bits.) We wish to stress that these
existing means for perfect sampling rely on finding a "good"
Markov chain for the problem at hand. RR does away with
the chain, and in doing so breaks the $O(n \ln n)$ barrier that
has characterized so many of these problems.

For independent sets and for proper colorings, the the- 
oretical bounds obtained apply only for a more restricted 
set of parameters than do those based on Markov chain 
approaches. However, when the appropriate restriction is 
met, our RR method is faster, yielding samples in a lin- 
ear (expected) number of steps. Moreover, much work has 
gone into analyses of Markov chains, while our work is still 
rather new, and we might hope with time and further effort 
eventually to match or even to relax the restrictions needed
for the Markov chain approaches. For the Move Ahead 1
chain we do not know any theoretical bounds on the run- 
ning time of our method. However, computer experiments 
show that for this problem the RR method works much bet- 
ter in practice than does the CFTP method. 

For the random cluster model, our RR technique is guar- 
anteed to run in a linear (expected) number of steps for a 
range of values of p. This is in sharp contrast to the Markov
chain approach, where no polynomial running time bounds 
are known except in trivial cases. 

In summary, the randomness recycler is not applicable 
in all situations where Markov chain approaches are used, 
but RR often gives a fast read-once interruptible means for 
generating perfect samples that in restricted cases gives the 
first linear time algorithms for some difficult and important 
problems. 